Sparse, Dense, and Attentional Representations for Text Retrieval


Abstract

Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional neural networks. Using both theoretical and empirical analysis, we establish connections between the encoding dimension, the margin between gold and lower-ranked documents, and the document length, suggesting limitations in the capacity of fixed-length encodings to support precise retrieval of long documents. Building on these insights, we propose a simple neural model that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures, and explore sparse-dense hybrids to capitalize on the precision of sparse retrieval. These models outperform strong alternatives in large-scale retrieval.
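The dual-encoder scoring scheme described above can be sketched as follows. This is a minimal illustration with random vectors standing in for the learned encoders (the dimension and corpus size are arbitrary choices, not values from the paper): each document and the query are fixed-length vectors, and documents are ranked by inner product with the query.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # encoding dimension (illustrative)

# Stand-ins for learned encoders: in a real dual encoder these vectors
# would be produced by neural networks applied to the text.
doc_vecs = rng.normal(size=(1000, d))  # one fixed-length vector per document
query_vec = rng.normal(size=(d,))      # one fixed-length vector for the query

# Score every document by its inner product with the query,
# then rank documents by descending score.
scores = doc_vecs @ query_vec
ranking = np.argsort(-scores)

print(ranking[:5])  # indices of the five highest-scoring documents
```

Because scoring reduces to a single matrix-vector product, retrieval over large corpora can use fast (approximate) maximum inner product search, which is the efficiency the abstract refers to.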



Similar Articles

Sparse representations for text categorization

Sparse representations (SRs) are often used to characterize a test signal using few support training examples, and allow the number of supports to be adapted to the specific signal being categorized. Given the good performance of SRs compared to other classifiers for both image classification and phonetic classification, in this paper, we extended the use of SRs for text classification, a metho...


Boosting Sparse Representations for Image Retrieval

In this thesis, we developed and implemented a method for creating sparse representations of real images for image retrieval. Feature selection occurs both offline by choosing highly selective features and online via “boosting”. A tree of repeated filtering with simple kernels is used to compute the initial set of features. A lower dimensional representation is then found by selecting the most ...


Subspace Clustering Reloaded: Sparse vs. Dense Representations

State-of-the-art methods for learning unions of subspaces from a collection of data leverage sparsity to form representations of each vector in the dataset with respect to the remaining vectors in the dataset. The resulting sparse representations can be used to form a subspace affinity matrix to cluster the data into their respective subspaces. While sparsity-driven methods for subspace cluster...


Dense Mapping for Range Sensors: Efficient Algorithms and Sparse Representations

This paper focuses on efficient occupancy grid building based on wavelet occupancy grids, a new sparse grid representation, and on a new update algorithm for range sensors. The update algorithm takes advantage of the natural multiscale properties of the wavelet expansion to update only the parts of the environment that are modified by the sensor measurements, and at the proper scale. The sparse wave...


Combining Text Vector Representations for Information Retrieval

This paper suggests a novel representation for documents that is intended to improve precision. This representation is generated by combining two central techniques: Random Indexing; and Holographic Reduced Representations (HRRs). Random indexing uses co-occurrence information among words to generate semantic context vectors that are the sum of randomly generated term identity vectors. HRRs are...
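The Random Indexing step described above can be sketched briefly: each term is assigned a fixed, sparse random "identity" vector, and a document's context vector is the sum of its terms' identity vectors. The toy documents, dimensionality, and sparsity settings below are illustrative assumptions, not values from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128  # dimensionality of the random index space (illustrative)

docs = [
    "sparse representations for retrieval",
    "dense vector representations for retrieval",
    "wavelet grids for range sensors",
]

# Each distinct term gets a fixed sparse random ternary identity vector.
vocab = sorted({w for doc in docs for w in doc.split()})
identity = {w: rng.choice([-1.0, 0.0, 1.0], size=d, p=[0.05, 0.9, 0.05])
            for w in vocab}

# A document's context vector is the sum of its terms' identity vectors.
doc_vecs = np.array([sum(identity[w] for w in doc.split()) for doc in docs])

def cos(a, b):
    # Cosine similarity between two context vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(doc_vecs[0], doc_vecs[1]), cos(doc_vecs[0], doc_vecs[2]))
```

Documents that share terms accumulate the same identity vectors, so their context vectors tend to be closer in this space; HRRs, not shown here, then bind these vectors into richer composite representations.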



Journal

Journal: Transactions of the Association for Computational Linguistics

Year: 2021

ISSN: 2307-387X

DOI: https://doi.org/10.1162/tacl_a_00369